{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chat with PDF using Azure AI\n",
    "\n",
    "This is a simple flow that allow you to ask questions about the content of a PDF file and get answers.\n",
    "You can run the flow with a URL to a PDF file and question as argument.\n",
    "Once it's launched it will download the PDF and build an index of the content. \n",
    "Then when you ask a question, it will look up the index to retrieve relevant content and post the question with the relevant content to OpenAI chat model (gpt-3.5-turbo or gpt4) to get an answer.\n",
    "\n",
    "## 0. Install dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Connect to Azure Machine Learning Workspace"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential\n",
    "\n",
    "try:\n",
    "    credential = DefaultAzureCredential()\n",
    "    # Check if given credential can get token successfully.\n",
    "    credential.get_token(\"https://management.azure.com/.default\")\n",
    "except Exception as ex:\n",
    "    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential not work\n",
    "    credential = InteractiveBrowserCredential()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.1 Get familiar with the primary interface - PFClient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import promptflow.azure as azure\n",
    "\n",
    "# Get a handle to workspace\n",
    "pf = azure.PFClient.from_config(credential=credential)\n",
    "runtime = \"chat_with_pdf_runtime\"  # TODO add guide to create runtime\n",
    "# runtime = None # automatic runtime"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 Create necessary connections\n",
    "\n",
    "Connection in prompt flow is for managing settings of your application behaviors incl. how to talk to different services (Azure OpenAI for example).\n",
    "\n",
    "Prepare your Azure Open AI resource follow this [instruction](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal) and get your `api_key` if you don't have one.\n",
    "\n",
    "Please go to [workspace portal](https://ml.azure.com/), click `Prompt flow` -> `Connections` -> `Create`, then follow the instruction to create your own connections. \n",
    "Learn more on [connections](https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/concept-connections?view=azureml-api-2)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conn_name = \"open_ai_connection\"\n",
    "\n",
    "# TODO integrate with azure.ai sdk\n",
    "# currently we only support create connection in Azure ML Studio UI\n",
    "# raise Exception(f\"Please create {conn_name} connection in Azure ML Studio.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Run a flow with setting (context size 2K)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "flow_path = \".\"\n",
    "data_path = \"./data/bert-paper-qna-3-line.jsonl\"\n",
    "\n",
    "config_2k_context = {\n",
    "    \"EMBEDDING_MODEL_DEPLOYMENT_NAME\": \"text-embedding-ada-002\",\n",
    "    \"CHAT_MODEL_DEPLOYMENT_NAME\": \"gpt-35-turbo\",\n",
    "    \"PROMPT_TOKEN_LIMIT\": 2000,\n",
    "    \"MAX_COMPLETION_TOKENS\": 256,\n",
    "    \"VERBOSE\": True,\n",
    "    \"CHUNK_SIZE\": 256,\n",
    "    \"CHUNK_OVERLAP\": 32,\n",
    "}\n",
    "\n",
    "column_mapping = {\n",
    "    \"question\": \"${data.question}\",\n",
    "    \"pdf_url\": \"${data.pdf_url}\",\n",
    "    \"chat_history\": \"${data.chat_history}\",\n",
    "    \"config\": config_2k_context,\n",
    "}\n",
    "\n",
    "run_2k_context = pf.run(\n",
    "    flow=flow_path,\n",
    "    data=data_path,\n",
    "    column_mapping=column_mapping,\n",
    "    runtime=runtime,\n",
    "    display_name=\"chat_with_pdf_2k_context\",\n",
    "    tags={\"chat_with_pdf\": \"\", \"1st_round\": \"\"},\n",
    ")\n",
    "pf.stream(run_2k_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(run_2k_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "detail = pf.get_details(run_2k_context)\n",
    "\n",
    "detail"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Evaluate the \"groundedness\"\n",
    "The [eval-groundedness flow](../../evaluation/eval-groundedness/) is using ChatGPT/GPT4 model to grade the answers generated by chat-with-pdf flow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_groundedness_flow_path = \"../../evaluation/eval-groundedness/\"\n",
    "eval_groundedness_2k_context = pf.run(\n",
    "    flow=eval_groundedness_flow_path,\n",
    "    runtime=runtime,\n",
    "    run=run_2k_context,\n",
    "    column_mapping={\n",
    "        \"question\": \"${run.inputs.question}\",\n",
    "        \"answer\": \"${run.outputs.answer}\",\n",
    "        \"context\": \"${run.outputs.context}\",\n",
    "    },\n",
    "    display_name=\"eval_groundedness_2k_context\",\n",
    ")\n",
    "pf.stream(eval_groundedness_2k_context)\n",
    "\n",
    "print(eval_groundedness_2k_context)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Try a different configuration and evaluate again - experimentation\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "flow_path = \".\"\n",
    "data_path = \"./data/bert-paper-qna-3-line.jsonl\"\n",
    "\n",
    "config_3k_context = {\n",
    "    \"EMBEDDING_MODEL_DEPLOYMENT_NAME\": \"text-embedding-ada-002\",\n",
    "    \"CHAT_MODEL_DEPLOYMENT_NAME\": \"gpt-35-turbo\",\n",
    "    \"PROMPT_TOKEN_LIMIT\": 3000,  # different from 2k context\n",
    "    \"MAX_COMPLETION_TOKENS\": 256,\n",
    "    \"VERBOSE\": True,\n",
    "    \"CHUNK_SIZE\": 256,\n",
    "    \"CHUNK_OVERLAP\": 32,\n",
    "}\n",
    "\n",
    "column_mapping = {\n",
    "    \"question\": \"${data.question}\",\n",
    "    \"pdf_url\": \"${data.pdf_url}\",\n",
    "    \"chat_history\": \"${data.chat_history}\",\n",
    "    \"config\": config_3k_context,\n",
    "}\n",
    "run_3k_context = pf.run(\n",
    "    flow=flow_path,\n",
    "    data=data_path,\n",
    "    column_mapping=column_mapping,\n",
    "    runtime=runtime,\n",
    "    display_name=\"chat_with_pdf_3k_context\",\n",
    "    tags={\"chat_with_pdf\": \"\", \"2nd_round\": \"\"},\n",
    ")\n",
    "pf.stream(run_3k_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(run_3k_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "detail = pf.get_details(run_3k_context)\n",
    "\n",
    "detail"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_groundedness_3k_context = pf.run(\n",
    "    flow=eval_groundedness_flow_path,\n",
    "    runtime=runtime,\n",
    "    run=run_3k_context,\n",
    "    column_mapping={\n",
    "        \"question\": \"${run.inputs.question}\",\n",
    "        \"answer\": \"${run.outputs.answer}\",\n",
    "        \"context\": \"${run.outputs.context}\",\n",
    "    },\n",
    "    display_name=\"eval_groundedness_3k_context\",\n",
    ")\n",
    "pf.stream(eval_groundedness_3k_context)\n",
    "\n",
    "print(eval_groundedness_3k_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pf.get_details(eval_groundedness_3k_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pf.visualize([eval_groundedness_2k_context, eval_groundedness_3k_context])"
   ]
  }
 ],
 "metadata": {
  "description": "A tutorial of chat-with-pdf flow that executes in Azure AI",
  "kernelspec": {
   "display_name": "prompt-flow",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.17"
  },
  "stage": "development"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
